Abstract

In this note we evaluate the expected waiting time to complete a
collection of coupons, in the case of coupons which arrive in groups of
constant size, independently and with unequal probabilities. As an
application we will be able to determine the expected number of samples
of size g that we have to draw independently in order to observe
all the types of individuals in a given population.
1 Introduction

The coupon-collector's problem is a classical problem in combinatorial
probability. The description of the basic problem is easy: consider one person
that collects coupons and assume that there is a finite number, say m, of
different types of coupons. These items arrive one by one in sequence, with
the types of the successive items being independent random variables that are
each equal to k with probability $p_k$. It is immediate to see how this
description can be adapted to the general problem of drawing independent
samples from a given finite distribution.

In the coupon-collector's problem, one is usually interested in answering
the following questions: what is the probability of completing the collection
(or a given subset of the collection) after the arrival of exactly n coupons
($n \ge m$)? What is the expected number of coupons that we need to complete
the collection (or to complete a given subset of the collection)? How do these
probabilities and expected values change if we assume that the coupons
arrive in groups of constant size?

The first results, due to De Moivre, Laplace and Euler (see [6] for a
comprehensive introduction to this topic), deal with the case of constant
probabilities $p_k = 1/m$, while the first results on the unequal case have to
be ascribed to Von Schelling (see [7]). Many other studies have been carried
out on this classical problem ever since (see e.g. [1], [3] and [4]).

The aim of this note is to evaluate the expected number of coupons that one
needs to collect in order to complete the collection, in the case of unequal
probabilities and multiple arrivals (i.e. the case in which the coupons arrive
in groups of constant size). To the best of our knowledge this result is new,
and in the case of uniform probabilities we derive the expression present in
the literature (see e.g. Stadje [6]) in a much easier way. Furthermore, we
will apply this computation to the problem of sampling without replacement g
individuals from a population composed of m types of individuals, present
in different proportions. We obtain an explicit computation of the expected
number of independent samples of size g that we have to draw in order to
observe all the types of individuals in the population, which could be of
interest for other applied problems.

2 The single arrival case 

In order to solve the problem in the multiple arrival setting, we shall start
with the easier single arrival case, and we shall see in the next section how
to extend this result to the multiple arrival case.

Let us fix the notation. We shall denote by $\{1, \dots, m\}$ the different types
of items which form the collection. Let us assume that the items are
purchased one by one in sequence, with the types of the successive items being
independent random variables that are each equal to k with probability $p_k$.
Since we are interested here in the number of items one needs to collect to
complete the collection, let us define the following set of random variables:
$X_1$ will denote the (random) number of items that we need to collect to have
the first coupon of our collection (which is trivially equal to 1), $X_2$ will be
the number of additional items that we need to collect to obtain the second
different coupon in our collection, and so on: for every $i \le m$, $X_i$ denotes
the number of items that we need to collect to pass from the $(i-1)$-th
to the $i$-th different coupon in the collection. From this classical description
(see e.g. Rosen [1]), we obtain that the random number of coupons that we
need to complete the collection is equal to $X = X_1 + \dots + X_m$ and that
$P[X < +\infty] = 1$.

In the case of constant probabilities, i.e. $p_k = 1/m$ for any $k \in \{1, \dots, m\}$,
it is immediate to see that the random variable $X_i$, for $i \in \{2, \dots, m\}$, has a
geometric law with parameter $(m - i + 1)/m$, since $i - 1$ types have already
been collected. The expected number of coupons that we need in order to
complete the collection is therefore given by the well-known formula

\[
E[X] = m \sum_{i=1}^{m} \frac{1}{i} . \tag{1}
\]
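
As a quick numerical check of formula (1), the following minimal sketch sums the expectations $m/(m-i+1)$ of the geometric waiting times for passing from $i-1$ to $i$ distinct types. The function name `expected_draws_uniform` is only illustrative and does not come from the paper.

```python
from fractions import Fraction


def expected_draws_uniform(m: int) -> Fraction:
    """Expected number of coupons needed to complete a collection of m
    equally likely types, computed exactly as a rational number."""
    # The wait for the i-th new type is geometric with success probability
    # (m - i + 1) / m, so its expectation is m / (m - i + 1).
    return sum(Fraction(m, m - i + 1) for i in range(1, m + 1))


if __name__ == "__main__":
    for m in (2, 3, 10):
        print(m, expected_draws_uniform(m), float(expected_draws_uniform(m)))
```

Summing $m/(m-i+1)$ over $i = 1, \dots, m$ is the same as $m \sum_{i=1}^{m} 1/i$, so the sketch agrees with (1) term by term.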

When the probabilities $p_k$ are unequal, one can look at the problem from a
slightly different angle. Let us define the following set of random variables:
$Y_1$ will denote the (random) number of items that we need to collect to obtain
the first coupon of type 1, $Y_2$ the number of items that we need to collect
to get the first coupon of type 2, and so on for the other coupons. In this
setting, the waiting time to complete the collection is given by the random
variable $Y = \max(Y_1, \dots, Y_m)$. In order to compute its expected value, one
can use the Maximum-Minimums identity (see [5], p. 345), obtaining

\[
\begin{aligned}
E[Y] = {} & \sum_{i} E[Y_i] - \sum_{i<j} E[\min(Y_i, Y_j)] + \sum_{i<j<k} E[\min(Y_i, Y_j, Y_k)] - \dots \\
& \dots + (-1)^{m+1} E[\min(Y_1, Y_2, \dots, Y_m)] .
\end{aligned}
\tag{2}
\]

Since the random variables $\min(Y_{i_1}, Y_{i_2}, \dots, Y_{i_k})$ have a geometric law with
parameter $p_{i_1} + p_{i_2} + \dots + p_{i_k}$, we get the formula

\[
E[Y] = \sum_{i} \frac{1}{p_i} - \sum_{i<j} \frac{1}{p_i + p_j} + \sum_{i<j<k} \frac{1}{p_i + p_j + p_k} - \dots + (-1)^{m+1} \frac{1}{p_1 + \dots + p_m} .
\tag{3}
\]

The problem described above can be rephrased as follows: let us consider a
finite distribution and let us evaluate the expected number of independent
samples that we have to draw in order to observe all the records. The quantity
(3) is clearly this value. It is interesting to note that if we would like to
evaluate the expected number of independent samples that we have to draw
in order to observe a fixed number k of records, with $k < m$, the present
approach is no longer suitable, and we have to reconsider the problem from
a slightly different point of view (see [2] for the details).
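
The inclusion-exclusion sum in formula (3) can be evaluated directly for small m. The sketch below is only an illustration under the assumption that the probabilities are passed as a list summing to 1; the name `expected_draws` is hypothetical.

```python
from itertools import combinations


def expected_draws(p):
    """Evaluate formula (3): alternating sum, over all non-empty subsets of
    types, of the reciprocal of the total probability of the subset."""
    m = len(p)
    total = 0.0
    for k in range(1, m + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        # combinations() picks entries by position, so repeated values in p
        # are treated as distinct types.
        for subset in combinations(p, k):
            total += sign / sum(subset)
    return total


if __name__ == "__main__":
    print(expected_draws([0.5, 0.3, 0.2]))
```

For m = 2 the sum reduces to $1/p_1 + 1/p_2 - 1$, which gives 3 when $p_1 = p_2 = 1/2$. The number of terms is $2^m - 1$, so this direct evaluation is practical only for small m.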

3 The multiple arrival case 

Let us now consider the case of coupons which arrive in groups of constant
size g, where $1 < g < m$, with the types of the items in any group of coupons
being independent random variables. A natural requirement in this context
is that each group does not contain more than one coupon of any type. With
this assumption, the total number of groups will be $\binom{m}{g}$ and each group A
can be identified with a vector $(a_1, \dots, a_g) \in \{1, \dots, m\}^g$ with $a_i < a_{i+1}$
for $i = 1, \dots, g - 1$. Removing this assumption, we have to consider all
the possible $m^g$ groups of coupons that we can obtain. In this case we will
describe the groups of coupons as $(a_1, \dots, a_g) \in \{1, \dots, m\}^g$ and we will see
at the end of this section how this problem applies to the case of sampling
from a given population.

Let us first consider the case in which each group does not contain more
than one coupon of any type. We can order the groups according to the
lexicographical order (i.e. $A = (a_1, \dots, a_g) < B = (b_1, \dots, b_g)$ if there exists
$i \in \{1, \dots, g\}$ such that $a_s = b_s$ for $s < i$ and $a_i < b_i$).

Definition 1 We shall denote by $q_i$, $i \in \{1, \dots, \binom{m}{g}\}$, the probability to
purchase (at any given time) the i-th group of coupons, according to the
lexicographical order. Moreover, given $k \in \{1, \dots, m - g\}$, we shall denote by
$q(i_1, \dots, i_k)$ the probability to purchase a group of coupons which does not
contain any of the coupons $i_1, \dots, i_k$.

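Remark 2 In order to compute the probabilities $q(i_1, \dots, i_k)$, one can
proceed as follows: by the defined ordering, it holds that

\[
q(1) = \sum_{i=\binom{m-1}{g-1}+1}^{\binom{m}{g}} q_i , \qquad
q(1,2) = \sum_{i=\binom{m-1}{g-1}+\binom{m-2}{g-1}+1}^{\binom{m}{g}} q_i ,
\]

and in general

\[
q(1,2,\dots,k) =
\begin{cases}
\displaystyle \sum_{i=\binom{m-1}{g-1}+\dots+\binom{m-k}{g-1}+1}^{\binom{m}{g}} q_i , & \text{if } k \le m-g \\
0 , & \text{otherwise.}
\end{cases}
\]

For any permutation $(i_1, \dots, i_m)$ of $(1, \dots, m)$, one first reorders the $q_i$'s
according to the lexicographical order of this new alphabet and then computes

\[
q(i_1,i_2,\dots,i_k) =
\begin{cases}
\displaystyle \sum_{i=\binom{m-1}{g-1}+\dots+\binom{m-k}{g-1}+1}^{\binom{m}{g}} q_i , & \text{if } k \le m-g \\
0 , & \text{otherwise.}
\end{cases}
\]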

Remark 3 There are many conceivable choices for the unequal probabilities
$q_i$. For example, we can assume that one forms the groups following
the strategy of the draft lottery in American professional sports, where
different proportions of the different coupons are put together and we choose
the coupons at random in sequence, discarding any duplicates, until
we obtain a group of g coupons. Or, more simply, we can assume that
the i-th coupon arrives with probability $p_i$ and that the probability of any
group is proportional to the product of the probabilities of the single coupons
it contains.
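
The second choice in Remark 3 can be sketched as follows: each group of g distinct types gets a weight equal to the product of the single-coupon probabilities, and the weights are normalised to sum to 1. The probabilities $q(i_1, \dots, i_k)$ of Definition 1 are then obtained by summing over the groups that avoid the excluded types, rather than through the index arithmetic of Remark 2. Types are 0-based here, and the function names are illustrative assumptions, not notation from the paper.

```python
from itertools import combinations
from math import prod


def group_probabilities(p, g):
    """Map each group of g distinct types (tuples of 0-based indices, listed
    in lexicographical order) to a probability proportional to the product
    of the single-coupon probabilities p[a]."""
    groups = list(combinations(range(len(p)), g))
    weights = [prod(p[a] for a in group) for group in groups]
    total = sum(weights)
    return {group: w / total for group, w in zip(groups, weights)}


def q_missing(q, excluded):
    """q(i_1, ..., i_k): probability that a purchased group contains none of
    the excluded types."""
    excluded = set(excluded)
    return sum(prob for group, prob in q.items() if excluded.isdisjoint(group))


if __name__ == "__main__":
    q = group_probabilities([0.4, 0.3, 0.2, 0.1], 2)
    print(q_missing(q, [0]), q_missing(q, [0, 1]))
```

With m = 4 equally likely coupons and g = 2, every group has probability 1/6, so the probability of missing one fixed type is 3/6 and the probability of missing three fixed types is 0.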

In order to evaluate the expected number of groups needed to
complete the collection, we shall use the approach of the single arrival case.
Let us start by considering the case of uniform probabilities, i.e. $q_i = 1/\binom{m}{g}$
for any i. Let us define the following set of random variables:

\[
V_i = \{\text{number of groups to purchase to obtain the first coupon of type } i\} .
\]



These random variables have a geometric law with parameter

\[
1 - \frac{\binom{m-1}{g}}{\binom{m}{g}} .
\]

The random variables $\min(V_i, V_j)$ have themselves a geometric law with
parameter

\[
1 - \frac{\binom{m-2}{g}}{\binom{m}{g}} ,
\]

and so on up to the random variables $\min(V_{i_1}, \dots, V_{i_{m-g}})$, which have a
geometric law with parameter

\[
1 - \frac{1}{\binom{m}{g}} .
\]

The minimum of more random variables, i.e. $\min(V_{i_1}, \dots, V_{i_k})$ for
$k \ge m - g + 1$, will be equal to the constant random variable 1, since every
group of g coupons must contain at least one of any $m - g + 1$ types.
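Applying the Maximum-Minimums principle, we obtain that the expected
number of groups of coupons that we need to complete the collection
is equal to

\[
\begin{aligned}
E[\max(V_1, \dots, V_m)] = {} & \sum_{1 \le i \le m} E[V_i] - \sum_{1 \le i < j \le m} E[\min(V_i, V_j)] + \dots \\
& + (-1)^{m-g+1} \sum_{1 \le i_1 < i_2 < \dots < i_{m-g} \le m} E[\min(V_{i_1}, \dots, V_{i_{m-g}})] \\
& + (-1)^{m-g+2} \sum_{1 \le i_1 < i_2 < \dots < i_{m-g+1} \le m} 1 + \dots + (-1)^{m+1} \\
= {} & \binom{m}{1} \frac{1}{1 - \binom{m-1}{g} / \binom{m}{g}} - \binom{m}{2} \frac{1}{1 - \binom{m-2}{g} / \binom{m}{g}} + \dots \\
& + (-1)^{m-g+1} \binom{m}{m-g} \frac{1}{1 - 1 / \binom{m}{g}} + \sum_{1 \le k \le g} (-1)^{m-g+k+1} \binom{m}{m-g+k} .
\end{aligned}
\]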



This result, even if not obtained with this computation, is known (see e.g. 
Stadje [6], p.872). 
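
The uniform formula can be evaluated exactly with rational arithmetic. The sketch below uses the convention that $\binom{n}{g} = 0$ for $n < g$, so that the terms with $k \ge m - g + 1$ reduce to $(-1)^{k+1} \binom{m}{k}$ and all terms can be written in a single sum. The function name is an illustrative assumption.

```python
from fractions import Fraction
from math import comb


def expected_groups_uniform(m: int, g: int) -> Fraction:
    """Expected number of groups of g distinct coupons needed to see all m
    types when every one of the C(m, g) groups is equally likely."""
    total_groups = comb(m, g)
    result = Fraction(0)
    for k in range(1, m + 1):
        # Probability that a group avoids k fixed types. math.comb returns 0
        # when m - k < g, which covers the constant minima equal to 1.
        miss = Fraction(comb(m - k, g), total_groups)
        result += (-1) ** (k + 1) * comb(m, k) / (1 - miss)
    return result


if __name__ == "__main__":
    for g in (1, 2, 3):
        print(g, expected_groups_uniform(5, g))
```

For g = 1 the value coincides with formula (1), and for g = m a single group completes the collection, so the value is 1.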

In the unequal case, we are able to generalize the previous result as follows: 



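Proposition 4 The expected number of groups of coupons that we need to
complete the collection, in the case of unequal probabilities $q_i$, is equal to

\[
\begin{aligned}
& \sum_{1 \le i \le m} \frac{1}{1 - q(i)} - \sum_{1 \le i < j \le m} \frac{1}{1 - q(i,j)} + \dots \\
& \dots + (-1)^{m-g+1} \sum_{1 \le i_1 < i_2 < \dots < i_{m-g} \le m} \frac{1}{1 - q(i_1, \dots, i_{m-g})}
+ \sum_{1 \le k \le g} (-1)^{m-g+k+1} \binom{m}{m-g+k} .
\end{aligned}
\tag{4}
\]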

Proof: Let us define, as before, the set of random variables $V_1, \dots, V_m$,
where $V_i$ denotes the (random) number of groups of coupons that we need
to collect in order to obtain for the first time the i-th coupon. It is
immediate to see that the random variables $V_i$ now have a geometric law
with parameter $1 - q(i)$. Similarly, the random variables $\min(V_i, V_j)$ have
a geometric law with parameter $1 - q(i,j)$, and so on up to the random
variables $\min(V_{i_1}, \dots, V_{i_{m-g}})$, which have a geometric law with parameter
$1 - q(i_1, \dots, i_{m-g})$. As in the uniform case, the minimum of more random
variables $V_i$, i.e. $\min(V_{i_1}, \dots, V_{i_k})$ for $k \ge m - g + 1$, will be equal to the
constant random variable 1. Applying the Maximum-Minimums principle,
we obtain that the expected number of groups of coupons that we need to
complete the collection is equal to $E[\max(V_1, \dots, V_m)]$, which is equal to (4).



□ 
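
A direct numerical reading of Proposition 4 is sketched below. It assumes the group distribution is given as a dictionary from groups (tuples of 0-based types) to probabilities, and that every type appears in at least one group of positive probability (otherwise some denominator vanishes). Subsets that no group can avoid have $q = 0$ and contribute $\pm 1$, which reproduces the last sum in (4). The function name is hypothetical.

```python
from itertools import combinations


def expected_groups(q, m):
    """Alternating sum over non-empty subsets S of the m types of
    1 / (1 - q(S)), where q(S) is the probability that a group contains
    no type of S."""
    total = 0.0
    for k in range(1, m + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(range(m), k):
            s = set(subset)
            miss = sum(pr for group, pr in q.items() if s.isdisjoint(group))
            total += sign / (1.0 - miss)
    return total


if __name__ == "__main__":
    # Three types, groups of size two, unequal group probabilities.
    q = {(0, 1): 0.5, (0, 2): 0.3, (1, 2): 0.2}
    print(expected_groups(q, 3))
```

As a sanity check, if only the groups (0, 1) and (0, 2) occur, each with probability 1/2, type 0 is always present and the problem reduces to collecting two equally likely coupons, whose expected waiting time is 3.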

Let us now assume that a group of coupons could contain more copies
of the same type. Defining now $\Omega = \{1, \dots, m\}^g$ and
$S(i_1, \dots, i_k) = \{(a_1, \dots, a_g) \in \Omega : a_j \notin \{i_1, \dots, i_k\} \text{ for } j = 1, \dots, g\}$,
we will denote by $q_\omega$, $\omega \in \Omega$, the probability to purchase (at any given
time) the group of coupons $\omega$. As before, given $k \in \{1, \dots, m\}$, we shall
denote by $q(i_1, \dots, i_k)$ the probability to purchase a group of coupons which
does not contain any of the coupons $i_1, \dots, i_k$. As pointed out before, the
assignment of the probabilities $q_\omega$ is in general not simple and, most of all,
not unique. However, if we assume to draw without replacement g elements
from a population composed of m different types of individuals which are
present in different proportions, it is easy to compute the previous
probabilities. To fix the notation, let N be the total number of individuals and
$N_1, \dots, N_m$ the number of individuals of each given type. A simple
computation gives, for example,

\[
q(i) = \prod_{j=0}^{g-1} \frac{N - N_i - j}{N - j} = \frac{P(N - N_i, g)}{P(N, g)} ,
\tag{5}
\]

where $P(n, k)$ denotes the number of ordered sequences of k elements from
n, and similarly, for fixed distinct $i_1, i_2, \dots, i_k$,

\[
q(i_1, \dots, i_k) = \prod_{j=0}^{g-1} \frac{N - N_{i_1} - \dots - N_{i_k} - j}{N - j}
= \frac{P(N - N_{i_1} - \dots - N_{i_k}, g)}{P(N, g)} .
\tag{6}
\]

Following the same ideas as before, it is easy to extend the result of
Proposition 4 to the present case:

Proposition 5 The expected number of groups of coupons that we need to
complete the collection, in the case of unequal probabilities $q_\omega$, is equal to

\[
\begin{aligned}
& \sum_{1 \le i \le m} \frac{1}{1 - q(i)} - \sum_{1 \le i < j \le m} \frac{1}{1 - q(i,j)} + \sum_{1 \le i < j < l \le m} \frac{1}{1 - q(i,j,l)} - \dots \\
& \dots + (-1)^{m+1} \frac{1}{1 - q(1, \dots, m)} .
\end{aligned}
\tag{7}
\]

Corollary 6 Let a population of size N be composed of m different types of
individuals in given proportions $N_1/N, \dots, N_m/N$. The expected number
of independent draws without replacement of g individuals that we have to
perform in order to observe at least once every type of individual is equal to

\[
\begin{aligned}
& \sum_{1 \le i \le m} \frac{1}{1 - \frac{P(N - N_i, g)}{P(N, g)}}
- \sum_{1 \le i < j \le m} \frac{1}{1 - \frac{P(N - N_i - N_j, g)}{P(N, g)}}
+ \sum_{1 \le i < j < l \le m} \frac{1}{1 - \frac{P(N - N_i - N_j - N_l, g)}{P(N, g)}} - \dots \\
& \dots + (-1)^{m+1} \frac{1}{1 - \frac{P(N - N_1 - \dots - N_m, g)}{P(N, g)}} .
\end{aligned}
\tag{8}
\]



Let us now evaluate (8) for some specific choices of the relative distribution in
the population and the size g of the single sample. First of all it is important
to note that if one type of individual is very rare with respect to the others,
the value (8) is very close to the expected number of samples that we have
to draw in order to obtain one element of this type. For example, if we
choose m = 4 and $N_1 = 10$, $N_2 = 100$, $N_3 = 500$, $N_4 = 1000$, we get that the
expected number of independent draws of two individuals that we need in
order to observe at least once each of the four types is approximately equal
to 81.5, while the expected number of independent draws of two individuals
that we need in order to observe one individual of the first type is equal to
80.7.
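
The two values quoted in this example can be reproduced with a few lines of code. The sketch below evaluates formula (8) using `math.perm` for $P(n, k)$ (which returns 0 when $k > n$) and assumes $g \le N$. The function names are illustrative and not taken from the paper.

```python
from itertools import combinations
from math import perm


def expected_samples(counts, g):
    """Formula (8): expected number of samples of size g, drawn without
    replacement within each sample, needed to observe every type, for a
    population with counts[i] individuals of type i."""
    n_total = sum(counts)
    total = 0.0
    for k in range(1, len(counts) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(counts, k):
            # Probability that a sample contains none of the chosen types.
            miss = perm(n_total - sum(subset), g) / perm(n_total, g)
            total += sign / (1.0 - miss)
    return total


def expected_samples_one_type(counts, i, g):
    """Expected number of samples of size g needed to observe type i."""
    n_total = sum(counts)
    return 1.0 / (1.0 - perm(n_total - counts[i], g) / perm(n_total, g))


if __name__ == "__main__":
    counts = [10, 100, 500, 1000]
    print(round(expected_samples(counts, 2), 1))              # 81.5
    print(round(expected_samples_one_type(counts, 0, 2), 1))  # 80.7
```

Multiplying `expected_samples(counts, g)` by g gives the expected number of individuals drawn, which is the quantity on the vertical axis when the dependence on g is studied.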

In order to see how the quantity (8) depends on the size g, we choose again
m = 4 and $N_1 = 10$, $N_2 = 100$, $N_3 = 500$, $N_4 = 1000$ and we compute the
value of (8) for g = 1, ..., 15. In Figure 1 we plot the expected number of
individuals that we have to draw in order to observe at least one individual
of each type. As one could expect, this expectation increases with g. We also
compare these values with the case of single arrivals (solid line).




Figure 1: Expected number of individuals to observe at least one individual
of each type for different group sizes (g = 1, ..., 15), computed using (8) with
m = 4 and $N_1 = 10$, $N_2 = 100$, $N_3 = 500$, $N_4 = 1000$. Comparison with the
single arrivals case (solid line). Horizontal axis: group size.

Conversely, for fixed g = 2, we can consider a population with an increasing
number m of different types.



Figure 2: Comparison between simulated and exact values. Expected number of
independent draws of two individuals that we need in order to observe at
least once each of the m types, with m = 5, ..., 20 (solid black circles).
Computation is performed using (8) with proportions of the types in the
population close to a Mandelbrot distribution of parameters c = 0.30 and
$\theta = 1.75$. Comparison with the simulated values (filled red circles).
Horizontal axis: number of different types.

Taking the proportions of the types in the population close to a Mandelbrot^1
distribution of parameters c = 0.30 and $\theta = 1.75$, Figure 2 shows the exact
and the simulated values of (8) for increasing values of m.

1 The Mandelbrot distribution assumes events to be ranked according to their frequency
of usage. The i-th most probable event has probability $p_i \propto (c + i)^{-\theta}$ for some constant
$c > 0$, and $\theta$ ranges over [1, 2].

Remark 7 It is important to note that both the expressions ^ and ^ are 
computationally hard, and the explicit computation of their values is possible
only for small values of m.
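
Since the exact expressions involve one term for every non-empty subset of the m types, a Monte Carlo estimate is a natural cross-check for larger m, in the spirit of the simulated values reported in Figure 2. The sketch below is not the simulation code used for that figure: it is a minimal illustration, with hypothetical names and parameters, that assumes every entry of `counts` is positive and that g does not exceed the population size.

```python
import random


def simulate_expected_samples(counts, g, trials=20000, seed=0):
    """Monte Carlo estimate of the expected number of samples of size g
    (without replacement within a sample, independent across samples)
    needed to observe every type at least once."""
    rng = random.Random(seed)
    # One entry per individual, labelled by its type.
    population = [t for t, c in enumerate(counts) for _ in range(c)]
    m = len(counts)
    total = 0
    for _ in range(trials):
        seen = set()
        draws = 0
        while len(seen) < m:
            seen.update(rng.sample(population, g))
            draws += 1
        total += draws
    return total / trials


if __name__ == "__main__":
    print(simulate_expected_samples([10, 100, 500, 1000], 2, trials=2000))
```

The cost of one simulated run grows with the waiting time itself rather than with $2^m$, so the estimate remains feasible when the exact sum is not, at the price of a statistical error that decreases like the inverse square root of the number of trials.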